Abstract: We consider allocation problems that arise in the context of service allocation in Clouds.
More specifically, we assume on the one hand that each computing resource is associated with a capacity
constraint, which can be chosen using the Dynamic Voltage and Frequency Scaling (DVFS) method, and with a
probability of failure. On the other hand, we assume that the service runs as a set of independent
instances of identical Virtual Machines. Moreover, there exists a Service Level Agreement (SLA) between
the Cloud provider and the client that can be expressed as follows: the client comes with a minimal number
of service instances which must be alive at the end of the day, and the Cloud provider offers a list of pairs
(price, compensation), this compensation being paid by the Cloud provider if it fails to keep alive the
required number of services. On the Cloud provider side, each pair actually corresponds to a guaranteed
success probability of fulfilling the constraint on the minimal number of instances.

In this context, given a minimal number of instances and a probability of success, the question for the 
Cloud provider is to find the number of necessary resources, their clock frequency and an allocation of the 
instances (possibly using replication) onto machines. This solution should satisfy all types of constraints 
during a given time period while minimizing the energy consumption of used resources. We consider two 
energy consumption models based on DVFS techniques, where the clock frequency of physical resources 
can be changed. For each allocation problem and each energy model, we prove deterministic approximation
ratios on the consumed energy for algorithms that provide guaranteed failure probabilities, as well as an
efficient heuristic, whose energy ratio is not guaranteed.

Key-words: Cloud, reliability, approximation, energy savings

Approximation algorithms for energy minimization
for service allocation in a Cloud
under reliability constraints

Summary: We consider a service allocation problem in Clouds. Computing resources are
characterized by a failure probability and a capacity constraint, which can be adjusted
using the technique known as Dynamic Voltage and Frequency Scaling (DVFS). There is a contract between the
provider and the client, in which the provider guarantees the client that a given number of instances of the
client's service will still be running at the end of the day, with a given probability. The question
is therefore to determine at which speed the processors should run, and to what extent the services should
be replicated on the machines. We present approximation algorithms, prove their approximation
factors on the consumed energy, and describe efficient heuristics.

Keywords: Cloud, reliability, approximation, energy savings

1 Introduction

1.1 Reliability and Energy Savings in Cloud Computing 

This paper considers energy savings and reliability issues that arise when allocating instances of an
application consisting of a set of independent instances running as Virtual Machines (VMs) onto Physical
Machines (PMs) in a Cloud computing platform. Cloud Computing pi] [l] [TO] l24ll has emerged as a 
well-suited paradigm for service providing over the Internet. Using virtualization, it is possible to run 
several Virtual Machines on top of a given Physical Machine. Since each VM hosts its complete software 
stack (Operating System, Middleware, Application), it is moreover possible to migrate VMs from a PM 
to another in order to dynamically balance the load. 

In the static case, mapping VMs with heterogeneous computing demands onto PMs with (possibly
heterogeneous) capacities can be modeled as a multi-dimensional bin-packing problem. Indeed, in this
context, each physical machine is characterized by its computing capacity (i.e. the number of flops it
can process during one time-unit), its memory capacity (i.e. the number of different VMs that it can
handle simultaneously, given that each VM comes with its complete software stack) and its failure rate
(i.e. the probability that the machine will fail during the next time period) and each service comes with
its requirement, in terms of CPU and memory demands, and reliability constraints.

In order to deal with capacity constraints in resource allocation problems, several sophisticated tech- 
niques have been developed in order to optimally allocate VMs onto PMs, either to achieve good load 
balancing BOlfTTl l?! or to minimize energy consumption ||7][5l. Most of the works in this domain have 
therefore concentrated on designing offline f23\ and online f2r,'26l solutions of Bin Packing variants. 

Reliability constraints have received much less attention in the context of Cloud computing, as un- 
derlined by Walfredo Cirne in IfTTl . Nevertheless, related questions have been addressed in the context 
of more distributed and less reliable systems such as Peer-to-Peer networks. In such systems, efficient 
data sharing is complicated by erratic node failure, unreliable network connectivity and limited
bandwidth. Thus, data replication can be used to improve both availability and response time and the question
is to determine where to replicate data in order to meet performance and availability requirements in
large-scale systems [36] [TSl [32] [28] [37]. Reliability issues have also been addressed by the High
Performance Computing community. Indeed, considerable effort has recently been devoted to building systems
capable of reaching Exaflop performance [19] [20], and such exascale systems are expected to gather billions
of processing units, thus increasing the importance of fault tolerance issues [12]. Solutions for fault
tolerance in Exascale systems are based on replication strategies [22] and rollback recovery relying on
checkpointing protocols lH [13].

This work is a follow-up of lO, where the question of how to evaluate the reliability of an allocation 
has been addressed and a set of deterministic and randomized heuristics have been proposed. In this paper, 
we concentrate on energy savings issues and we propose approximation algorithms with proven guarantees.
In order to minimize energy consumption, we assume that we can rely on sophisticated mechanisms to fix
the clock frequency of the PMs and we rely on techniques such as DVFS (see [29] [34] [14] [2] [15]). In
this context, the capacity of the PM can be expressed as a function of the clock frequency. In general,
the probability of failure may itself depend on the clock frequency (see for instance [35]). We will
nevertheless not consider this case in this paper and we leave it for future work.

To assess precisely the specific complexity of energy minimization introduced by reliability con- 
straints in the context of services allocation in Clouds, we concentrate on a simple context, that never- 
theless captures the main difficulties. First, we consider that the service running on the Cloud platform 
consists of a number of identical (in terms of requirements) and independent instances. Therefore, we 
do not consider the problems introduced by heterogeneity, that have already been considered (see for 
instance lflTl l4l). Indeed, as soon as heterogeneity is considered, basic allocation problems are amenable 
to the Bin Packing problem and are therefore intrinsically difficult. Then, we consider static allocation
problems only, in the sense that our goal is to find the allocation that optimizes the reliability during a time
period (say at the end of the day), instead of relying on VM migrations and creations to ensure that a
minimal number of instances of each service is running whatever the machine failures. Therefore, our
work makes it possible to assess precisely the complexity introduced by machine failures and service
reliability demands on energy minimization.

Throughout this paper, we assume that the characteristics of the applications and their requirements
(in terms of reliability in particular) have been negotiated between a client and the provider through a
Service Level Agreement (SLA). In the SLA, each service is characterized by its demand in terms of
processing capability (i.e. the minimal number of instances of VMs that must be running simultaneously)
and in terms of reliability (i.e. the maximal probability that the service will not benefit from this number
of instances at some point during the next time period). Equivalently, the reliability requirement may be
negotiated through the payment of a fine by the cloud provider if it fails to provide the required amount
of resources.

In both cases, the goal, from the provider's point of view, is to determine the cost of reliability, since a
higher reliability will induce more replication and therefore more energy consumption. Our goal in this
paper is to find allocations that minimize energy consumption while enforcing reliability constraints and
therefore to determine the cost of reliability. This cost of reliability can then be directly translated into a
set of (price, fine in case of SLA violation) offers by the Cloud provider.

1.2 Notations 

In this section, we introduce the notations that will be used throughout the paper. Our target cloud
platform is made of $m$ physical machines $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_m$. As already noted, we assume that Machine
$\mathcal{M}_j$ is able to handle the execution of $\mathrm{Capa}_j$ instances of service. We also assume that we can rely on
the Dynamic Voltage and Frequency Scaling (DVFS) mechanism in order to adapt $\mathrm{Capa}_j$. The energy consumed
by machine $\mathcal{M}_j$ when running at capacity (speed proportional to) $\mathrm{Capa}_j$ is given by $E = E_{stat}(j) + E_{dyn}(j)$,
where $E_{dyn}(j) = e_j \mathrm{Capa}_j^{\alpha}$, with $\alpha \ge 2$. This means that the energy consumed by machine
$\mathcal{M}_j$ can be seen as the sum of a leakage term (paid as soon as the machine is switched on) and of a term
that depends (most of the works consider that $2 \le \alpha \le 3$) on its running speed. For the sake of simplicity,
we will assume throughout this paper that any $\mathrm{Capa}_j$ can be achieved by Machine $\mathcal{M}_j$, as advocated in

UMIEIIEII. 
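
As a minimal sketch of this energy model, the following Python functions evaluate $E_{stat}(j) + e_j \mathrm{Capa}_j^{\alpha}$ for one machine and sum it over a platform. The names `machine_energy`, `platform_energy`, `e_stat` and `e_dyn` are illustrative assumptions, all machines are assumed to share the same coefficients, and treating a machine of capacity 0 as switched off is a convention chosen here for the example.

```python
def machine_energy(capa, e_stat, e_dyn, alpha):
    """Energy of one switched-on machine: leakage term plus dynamic term."""
    if alpha < 2:
        raise ValueError("the model assumes alpha >= 2")
    return e_stat + e_dyn * capa ** alpha


def platform_energy(capas, e_stat, e_dyn, alpha):
    """Total energy; a machine with capacity 0 is treated as switched off (assumption)."""
    return sum(machine_energy(c, e_stat, e_dyn, alpha) for c in capas if c > 0)


# Hypothetical values: E_stat = 1, e_j = 0.5, alpha = 3.
print(machine_energy(2, 1.0, 0.5, 3))           # 1 + 0.5 * 2**3
print(platform_energy([2, 0, 1], 1.0, 0.5, 3))  # the second machine is off
```

Because the dynamic term grows as a power of the capacity, spreading instances over more machines lowers the dynamic energy but adds one leakage term per extra machine.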

On this Cloud platform, our goal is to run (all through a given time period, say a day, as defined
in the SLA) a service $S$. Dem identical and independent instances of service $S$ are required, and the
instances run as Virtual Machines. Several instances can therefore run concurrently and independently
on the same physical machine. We will denote by $A_j$ the number of instances running on $\mathcal{M}_j$, which
cannot exceed $\mathrm{Capa}_j$. $\sum_j A_j$ represents the overall number of running instances of $S$. In general,
$\sum_j A_j$ is larger than Dem since replication, i.e. over-provisioning of services, is used in order to enforce
reliability constraints.

More precisely, each machine $\mathcal{M}_j$ comes with a failure rate $\mathrm{Fail}_j$, which represents the probability of
failure of $\mathcal{M}_j$ during the time period. During the time period, we will not reallocate instances of service to
physical machines but rather provision extra instances for the service (replicas) that will actually be used
if some machines fail. In practice, $\mathrm{Fail}_j$ depends on the clock frequency (although no clear consensus
exists in the literature on how, see for instance [19] [25] [38]) and therefore on $\mathrm{Capa}_j$. As said previously,
we will assume for the results proved in this paper that $\mathrm{Fail}_j$ does not depend on $\mathrm{Capa}_j$.

We will denote by Alive the set of running machines. In our model, at the end of the time period,
the machines are either up or completely down, so that the number of instances running on $\mathcal{M}_j$ is $A_j$ if
$\mathcal{M}_j \in \mathrm{Alive}$ and 0 otherwise. Therefore, $\mathrm{AliveInst} = \sum_{\mathcal{M}_j \in \mathrm{Alive}} A_j$ denotes the overall number of
running instances at the end of the time period and the service is running properly at the end of the time
period if and only if $\sum_{\mathcal{M}_j \in \mathrm{Alive}} A_j \ge \mathrm{Dem}$.

Of course, our goal is not that all services should run properly at the end of the time period. Indeed,
such a reliability cannot be achieved in practice since the probability that all machines fail is clearly
larger than 0 in our model. In general, as noted in a recent article in the NY Times, data centers usually
over-provision resources (at the price of high energy consumption) in order to (quasi-)avoid failures. In
our model, we assume a more sustainable model, where the SLA defines the reliability requirement Rel
for the service (together with the penalty paid by the Cloud Provider if $S$ does not run with at least
Dem instances at the end of the period). Therefore, the Cloud provider faces the following optimization
problem:

BestEnergy($m$, Dem, Rel): Find the set On of machines that are on, the clock frequency assigned
to Machine $\mathcal{M}_j$, represented by $\mathrm{Capa}_j$, and an allocation $A$ of instances to machines $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_m$
such that (i) $\forall j \in \mathrm{On}, A_j \le \mathrm{Capa}_j$, (ii) $P(\mathrm{AliveInst} \ge \mathrm{Dem}) \ge 1 - \mathrm{Rel}$, i.e. the probability that at
least Dem instances of $S$ are running on alive machines after the time period is larger than the reliability
requirement $1 - \mathrm{Rel}$, and (iii) the overall energy consumption $\sum_{j \in \mathrm{On}} \left(E_{stat}(j) + e_j \mathrm{Capa}_j^{\alpha}\right)$ is minimized.

1.3 Methodology 

Throughout the paper, we will rely on the same general approach. From Section 2 through Sections 3 and
4, we progressively generalize the problem (from the problem of assigning
a fractional number of instances onto a fixed number of resources, to the problem of allocating an integer
number of instances onto a set of resources to be determined).

In order to prove claimed approximation ratios, we will rely on the following techniques. 

For the lower bounds, we prove that for a service, given the reliability constraints of the service and
given failure probabilities of the machines, at least a given number of instances or at least a given level of
energy is needed. These results are obtained through careful applications of Hoeffding bounds [27].

For the upper bounds, we concentrate on two special allocation schemes, namely Homogeneous and
Step. In a Homogeneous solution, for each service, we assign to every machine the same number of
instances (that may be fractional or integral depending on the context), i.e. $\forall i, \forall j \in \mathrm{On}, A_{i,j} = A_i$.
In a Step solution, we authorize one unit step, i.e. $\forall i, \forall j \in \mathrm{On}, 0 \le A_{i,j} - A_i \le 1$. Using these two
allocation schemes, we are able to derive theoretical bounds relying on Chernoff bounds. Moreover,
the comparison with the lower bound shows that the quality of obtained solutions is reasonably high,
especially in the case of energy minimization, and even asymptotically optimal when the size of the
platform or the overall volume of service instances to be handled becomes arbitrarily large.

1.4 Motivating example 

In order to illustrate the objective functions that we consider throughout this paper and the notations, 
let us consider a service with a demand Dem = 20 and a reliability request of Rel = 8 • 10^®, that 
has to be mapped onto a cloud composed of m = 10 physical machines, whose failure probability is 
Fail = 10^^. Figure 1 depicts the various kinds of solutions that we consider in this paper. In terms of
minimizing the number of instances, the best solution consists in allocating 10 instances of the service to
the first 2 machines and 5 instances to the 8 remaining machines. Therefore, the optimal solution allocates
a total of 60 instances, whereas only 20 instances are required at the end of the day, in order to satisfy
reliability constraints. The shape of the optimal solution reflects the complexity of the problem.
Indeed, it has been proved in [3] that even in the case with a single service and even if the allocation
is given, estimating its reliability is #P-complete. The #P complexity class has been introduced
by Valiant [39] in order to classify the problems where the goal is not to determine whether there exists
a solution (captured by the notion of NP-completeness) but rather to determine the number of solutions. In
our context, the reliability of an allocation is related to the number (weighted by their probability) of
Alive sets that lead to an allocation where all service demands are satisfied. In this example, to check
that the reliability is larger than (in fact equal to) Rel, we can observe that all configurations where at
least 4 machines are alive are acceptable (since at least 20 instances are alive as soon as 4 machines are
up), together with all configurations with 3 machines, as soon as a machine loaded with 10 instances is
involved, and the configuration with only the first two machines alive. Counting such valid
configurations (weighted by their probability) yields the reliability of the allocation. We can notice that
the optimal solution involves 60 instances against around 67 for the best fractional homogeneous solution,
and 64 in the best step solution. Nevertheless, we will use fractional homogeneous and step solutions in
order both to derive approximation algorithms and upper bounds on the number of required resources,
and we will see that they are in general close to the optimal.
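
The counting argument above can be checked numerically on small integer allocations. The sketch below builds the distribution of the number of alive instances machine by machine, assuming independent failures with a common probability; it is an illustration for small cases, not a general-purpose reliability estimator. The function names and the value `fail=0.1` are assumptions chosen for the example.

```python
def alive_distribution(alloc, fail):
    """Map n -> P(exactly n instances are alive), for independent machine failures."""
    dist = {0: 1.0}
    for a in alloc:
        nxt = {}
        for n, p in dist.items():
            nxt[n] = nxt.get(n, 0.0) + p * fail                  # machine is down
            nxt[n + a] = nxt.get(n + a, 0.0) + p * (1.0 - fail)  # machine is up
        dist = nxt
    return dist


def success_probability(alloc, fail, dem):
    """P(at least dem instances are alive at the end of the period)."""
    return sum(p for n, p in alive_distribution(alloc, fail).items() if n >= dem)


# Allocation of the motivating example: 10 instances on 2 machines, 5 on the other 8.
alloc = [10, 10] + [5] * 8
print(1.0 - success_probability(alloc, fail=0.1, dem=20))  # failure probability
```

The failing configurations are exactly those listed in the text's complement: no machine alive, one machine alive, two machines alive unless both carry 10 instances, and three alive machines that all carry 5 instances.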

As far as energy minimization is concerned, we can notice that if we assume $\alpha = 2$, despite the bad
load-balancing among the machines in the optimal solution for the number of instances, this solution
remains optimal. On the other hand, if $\alpha = 3$ for instance, then the best step solution consumes less
energy than the solution minimizing the number of instances. We will prove in this paper that step and
homogeneous fractional solutions are in fact asymptotically optimal when the overall demand, or the
number of machines involved in the solution, becomes arbitrarily large.



Figure 1: Motivating example (panels: Best fractional homogeneous, Best step, Optimal)



1.5 Outline of the Paper 

As we have noticed through the motivating example, BestEnergy is in general difficult. Nevertheless,
we prove in this paper that even when the allocation is to be determined, it is possible to provide
low-complexity deterministic approximation algorithms, which are even asymptotically optimal when the sum
of the demands becomes arbitrarily large. Another original result that we prove in this paper is that
minimizing the energy (relying on DVFS) induced by replication is easier than minimizing the number of
replicas, whereas in many contexts (see [6]) the non-linearity of energy consumption makes the
optimization problems harder. In our context, approximation ratios are smaller for energy minimization than for
classical replica (that would correspond to makespan or load balancing in other contexts) minimization.

To prove this result, we progressively come to the most general problem through the study of
simpler objective functions. Firstly, we address in Section 2 the case where we are given a single service,
where the set of machines that are switched on is given and where the number of instances allocated to
a machine is allowed to be fractional. Finding allocations such that the number of instances we place is
minimum is denoted as the Min-Energy-No-Shutdown problem. Then, we address in Section 3 the more
general Min-Energy problem. For Min-Energy, the setting is the same except that the number of
participating machines is to be determined. Finally, in Section 4, we tackle the problem of designing more
realistic solutions, where the number of instances on each machine must be an integer. In Homogeneous,
all participating machines are allocated the same number of instances whereas in Step, the number of
instances allocated to a machine can differ by at most one (either a or a + 1 for some value of a).

2 Fractional Min-Energy-No-Shutdown
2.1 Lower bound 

Let us consider the case of a single service to be mapped onto a fixed number of machines when the
objective is to minimize the amount of resources necessary to enforce the conditions defined in the SLA
in terms of quantity (of alive instances at the end of the day) and reliability. The problem comes in two
flavours depending on the resources we want to optimize. Recall that $A_j$ is the number of instances of
the service initially allocated to machine $\mathcal{M}_j$. In its physical machines version, the optimization problem
consists in minimizing the number of instances allocated to the different machines, i.e. $\sum_j A_j$. In its
energy minimization version, we rely on the DVFS mechanism in order to adapt the voltage of a machine to
the need of the instances allocated to it. In general, energy consumption models assume that the energy
dissipated by a processor running at speed $s$ is proportional to $s^{\alpha}$. Therefore, the energy dissipated by a
processor running $A_j$ instances will be proportional to $A_j^{\alpha}$ and the overall objective is to minimize the
overall dissipated energy, i.e. $\sum_j A_j^{\alpha}$.

In order to find the lower bound, let us consider any allocation (where $A_j$ is the number of service
instances initially allocated to machine $\mathcal{M}_j$) and let us prove that if the amount of resources is too small,
then reliability constraints cannot be met. Recall that $\mathrm{AliveInst}_j$ is the number of instances of the
service that are alive on machine $\mathcal{M}_j$ at the end of the day. $\mathrm{AliveInst}_j$ is thus a random variable equal
to $A_j$ with probability $1 - \mathrm{Fail}$ and to 0 with probability Fail.

Hence, the expected number of alive instances is given by $E(\mathrm{AliveInst}) = (1 - \mathrm{Fail}) \sum_{j=1}^{m} A_j$.
Hoeffding's inequality (see [27]) says how much the number of alive resources may differ from its expected
value. In particular, for the lower bound, we will use it in the following form, which bounds the chance of
being lucky, i.e. of finding a correct allocation with few instances. More precisely, it states that for all $t > 0$:

$$P\left(\mathrm{AliveInst} \ge E(\mathrm{AliveInst}) + t\right) \le \exp\left(-\frac{2 t^2}{\sum_{j=1}^{m} A_j^2}\right).$$

Let us choose $t = \sqrt{-\ln(1 - \mathrm{Rel}) \sum_{j=1}^{m} A_j^2 / 2}$ so that $\exp\left(-2 t^2 / \sum_{j=1}^{m} A_j^2\right) = 1 - \mathrm{Rel}$. Noting
$K' = -\ln(1 - \mathrm{Rel}) / 2$, we obtain that a necessary condition on the $A_j$'s so that the reliability constraint is
enforced is given by $(1 - \mathrm{Fail}) \sum_{j=1}^{m} A_j + \sqrt{K' \sum_{j=1}^{m} A_j^2} \ge \mathrm{Dem}$.

As stated in the introduction of this section, we are interested either in minimizing $\sum_j A_j$ for resource
use minimization, or $\sum_j A_j^{\alpha}$ for energy minimization. To obtain lower bounds on these quantities under
the quantitative (number of alive instances) and qualitative (reliability) constraints, we rely
on Hölder's inequality, which states that if $1/p + 1/q = 1$, then $\forall a_j, b_j \ge 0$,
$\sum_j a_j b_j \le \left(\sum_j a_j^p\right)^{1/p} \left(\sum_j b_j^q\right)^{1/q}$. We assume in the following that $\alpha > 2$.

With $p = q = 2$, $a_j = b_j = A_j$, we obtain $\sum_j A_j^2 \le \left(\sum_j A_j\right)^2$ so that
$(1 - \mathrm{Fail}) \sum_{j=1}^{m} A_j + \sqrt{K' \sum_{j=1}^{m} A_j^2} \le \left(1 - \mathrm{Fail} + \sqrt{K'}\right) \sum_{j=1}^{m} A_j$. Hence a necessary
condition in order to satisfy the constraints is given by

$$\sum_{j=1}^{m} A_j \ge \frac{\mathrm{Dem}}{1 - \mathrm{Fail} + \sqrt{K'}} = \mathrm{MinRep}.$$

Therefore, any solution that satisfies quantitative and qualitative constraints must allocate at least MinRep
instances, whatever the distribution of instances onto machines is.

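With $p = \alpha$, $1/q = 1 - 1/\alpha$, $a_j = A_j$ and $b_j = 1$, we obtain $\sum_j A_j \le \left(\sum_j A_j^{\alpha}\right)^{1/\alpha} m^{1 - 1/\alpha}$.
Similarly (remember that $\alpha > 2$ so that $\alpha/2 > 1$), with $p = \alpha/2$, $1/q = 1 - 2/\alpha$, $a_j = A_j^2$ and
$b_j = 1$, we obtain $\sum_j A_j^2 \le \left(\sum_j A_j^{\alpha}\right)^{2/\alpha} m^{1 - 2/\alpha}$, so that

$$(1 - \mathrm{Fail}) \sum_{j=1}^{m} A_j + \sqrt{K' \sum_{j=1}^{m} A_j^2} \le \left((1 - \mathrm{Fail})\, m^{1 - 1/\alpha} + \sqrt{K'}\, m^{1/2 - 1/\alpha}\right) \left(\sum_{j=1}^{m} A_j^{\alpha}\right)^{1/\alpha}.$$

Also, we can derive another necessary condition defined as

$$\sum_{j=1}^{m} A_j^{\alpha} \ge \left(\frac{\mathrm{Dem}}{(1 - \mathrm{Fail})\, m^{1 - 1/\alpha} + \sqrt{K'}\, m^{1/2 - 1/\alpha}}\right)^{\alpha} = \mathrm{MinEnergy}.$$

Therefore, any solution that satisfies quantitative and qualitative constraints must consume at least
MinEnergy, whatever the distribution of instances onto machines is.
Note that both results hold true for $\alpha = 2$.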
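
The two lower bounds can be evaluated directly. The following sketch assumes $K' = -\ln(1 - \mathrm{Rel})/2$, MinRep $= \mathrm{Dem}/(1 - \mathrm{Fail} + \sqrt{K'})$ and MinEnergy $= \left(\mathrm{Dem} / ((1 - \mathrm{Fail}) m^{1 - 1/\alpha} + \sqrt{K'} m^{1/2 - 1/\alpha})\right)^{\alpha}$; the function name and the numerical values are hypothetical.

```python
import math


def lower_bounds(dem, m, fail, rel, alpha):
    """Return (MinRep, MinEnergy) from the Hoeffding-based necessary conditions."""
    k_prime = -math.log1p(-rel) / 2.0           # K' = -ln(1 - Rel) / 2
    root = math.sqrt(k_prime)
    min_rep = dem / (1.0 - fail + root)
    denom = (1.0 - fail) * m ** (1.0 - 1.0 / alpha) + root * m ** (0.5 - 1.0 / alpha)
    min_energy = (dem / denom) ** alpha
    return min_rep, min_energy


# Hypothetical instance: Dem = 20, m = 10, Fail = 0.1, Rel = 8e-6, alpha = 3.
print(lower_bounds(20, 10, 0.1, 8e-6, 3.0))
```

Both values are only necessary thresholds: an allocation below either of them cannot meet the reliability constraint, but reaching them does not guarantee feasibility.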

2.2 Upper bound - Homogeneous 
2.2.1 Min-Replication 

As explained above, in order to obtain upper bounds on the amount of necessary resources (either in terms
of number of instances or energy), it is enough to exhibit a valid solution (that satisfies the constraints
defined in the SLA). To achieve this, we will concentrate in this part on homogeneous (fractional) solutions,
with an equally-balanced allocation among all machines (i.e. $\forall j, A_j = A$).

An assignment is considered as failed when there are not enough instances of the service that are
running at the end of the day, hence $P_{fail} = P(\mathrm{AliveInst} < \mathrm{Dem})$. From the homogeneous characteristics
of the allocations, we derive that $\mathrm{AliveInst} = A \times |\mathrm{Alive}|$, then $P_{fail} = P(|\mathrm{Alive}| < \mathrm{Dem}/A)$. $|\mathrm{Alive}|$
can be described as the sum of independent random variables $\sum_{j=1}^{m} X_j$ where, for all $j \in \{1, \ldots, m\}$,
$X_j$ depicts the fact that machine $\mathcal{M}_j$ is alive at the end of the day ($X_j$ is equal to 1 with probability
$1 - \mathrm{Fail}$, and to 0 with probability Fail).

Hence, the expected value of $|\mathrm{Alive}|$ is given by $E(|\mathrm{Alive}|) = (1 - \mathrm{Fail})\, m$. The Chernoff bound (see [16])
says how much the number of alive machines may differ from its expected value. We use in this
part Chernoff bounds rather than Hoeffding bounds because the random variables take their value in
$\{0, 1\}$ instead of $\{0, \ldots, A\}$ and Chernoff bounds are more accurate in this case. In particular, for
the upper bound, we will use it in the following form, which bounds the chance of being unlucky, i.e.
of failing to obtain a correct allocation while allocating a large number of instances. More specifically,
$P(|\mathrm{Alive}| < (1 - \mathrm{Fail} - \varepsilon)\, m) \le e^{-2 \varepsilon^2 m}$. As we want to ensure that $P_{fail} \le \mathrm{Rel}$, we choose $\varepsilon$ such
that $e^{-2 \varepsilon^2 m} = \mathrm{Rel}$, i.e. $\varepsilon = \sqrt{K/m}$ by noting $K = -\ln(\mathrm{Rel})/2$. Finally, we obtain a sufficient condition,
so that the reliability constraint is fulfilled for the service: $A m \ge \frac{\mathrm{Dem}}{1 - \mathrm{Fail} - \sqrt{K/m}} = \mathrm{MaxRep}$.



Therefore, it is possible to satisfy the SLA with at most MaxRep instances of the service. Similarly,
we can derive an upper bound of the energy needed to enforce the SLA. Indeed, with the same value of
$A$, we obtain $A^{\alpha} m \ge \left(\frac{\mathrm{Dem}}{(1 - \mathrm{Fail})\, m^{1 - 1/\alpha} - \sqrt{K}\, m^{1/2 - 1/\alpha}}\right)^{\alpha} = \mathrm{MaxEnergy}$.
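
A minimal sketch of the homogeneous upper bounds follows, assuming $K = -\ln(\mathrm{Rel})/2$, $\varepsilon = \sqrt{K/m}$ and a common fractional load $A = \mathrm{Dem} / (m (1 - \mathrm{Fail} - \varepsilon))$. The function name and the sample values are hypothetical; the explicit error raised when $1 - \mathrm{Fail} - \varepsilon \le 0$ is a choice made for this example, since the bound is then vacuous.

```python
import math


def homogeneous_upper_bounds(dem, m, fail, rel, alpha):
    """Return (A, MaxRep, MaxEnergy) for the Chernoff-based homogeneous solution."""
    k = -math.log(rel) / 2.0                    # K = -ln(Rel) / 2
    margin = 1.0 - fail - math.sqrt(k / m)      # 1 - Fail - epsilon
    if margin <= 0.0:
        raise ValueError("too few machines for the Chernoff-based bound to apply")
    a = dem / (m * margin)                      # fractional load per machine
    return a, a * m, m * a ** alpha


# Hypothetical instance: Dem = 20, m = 100, Fail = 0.1, Rel = 1e-6, alpha = 3.
print(homogeneous_upper_bounds(20, 100, 0.1, 1e-6, 3.0))
```

On small platforms the margin quickly becomes negative, which illustrates why the Chernoff-based solution is pessimistic when few machines are available.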

2.3 Comparison 

When minimizing the number of necessary instances to enforce the SLA, we obtain
$\frac{\mathrm{MaxRep}}{\mathrm{MinRep}} = \frac{1 - \mathrm{Fail} + \sqrt{K'}}{1 - \mathrm{Fail} - \sqrt{K/m}}$.
For realistic values of the parameters, the above approximation ratio is good (close to one), since both
$\sqrt{K'} = \sqrt{-\ln(1 - \mathrm{Rel})/2}$ and $\sqrt{K/m} = \sqrt{-\ln(\mathrm{Rel})/(2m)}$ are small as soon as $m$ is large. Nevertheless, the
ratio is not asymptotically optimal when $m$ becomes large.

On the other hand, for energy minimization, we have

$$\frac{\mathrm{MaxEnergy}}{\mathrm{MinEnergy}} = \left(\frac{(1 - \mathrm{Fail})\, m^{1 - 1/\alpha} + \sqrt{K'}\, m^{1/2 - 1/\alpha}}{(1 - \mathrm{Fail})\, m^{1 - 1/\alpha} - \sqrt{K}\, m^{1/2 - 1/\alpha}}\right)^{\alpha}$$

so that the ratio tends to 1 when $m$ becomes arbitrarily large. This shows that for energy minimization,
homogeneous (fractional) solutions provide very good results when $m$ is large enough. In the following
section, we prove that an allocation with a large dispersion (in a sense described precisely below) of the
number of instances allocated to the machines cannot achieve SLA constraints with optimal energy.
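
The convergence of the energy ratio can be observed numerically. The sketch below evaluates MaxEnergy/MinEnergy after dividing numerator and denominator by $m^{1 - 1/\alpha}$, which gives $\left(\frac{(1 - \mathrm{Fail}) + \sqrt{K'/m}}{(1 - \mathrm{Fail}) - \sqrt{K/m}}\right)^{\alpha}$; the function name and the parameter values are hypothetical.

```python
import math


def energy_ratio(m, fail, rel, alpha):
    """MaxEnergy / MinEnergy for m machines (homogeneous upper vs. lower bound)."""
    k_prime = -math.log1p(-rel) / 2.0   # K' = -ln(1 - Rel) / 2
    k = -math.log(rel) / 2.0            # K  = -ln(Rel) / 2
    num = (1.0 - fail) + math.sqrt(k_prime / m)
    den = (1.0 - fail) - math.sqrt(k / m)
    if den <= 0.0:
        raise ValueError("too few machines for the upper bound to apply")
    return (num / den) ** alpha


for m in (100, 10_000, 1_000_000):
    print(m, energy_ratio(m, fail=0.1, rel=1e-6, alpha=3.0))
```

Both correction terms decay like $1/\sqrt{m}$, so the ratio approaches 1 slowly but monotonically as the platform grows.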

2.4 Can optimal solutions be strongly heterogeneous ? 

The above results state that for the minimization of the number of instances and for the minimization of
the energy, homogeneous allocations provide good solutions. Nevertheless, we know from the example
depicted in Figure 1 that optimal solutions, for both the minimization of the number of instances and
the minimization of the energy, are not always homogeneous. In the case of energy minimization, the
dispersion of an allocation cannot be too large, as stated more formally in the following theorem.

Theorem 1. Let us consider a valid allocation Aj whose energy is not larger than MaxEnergy, the up- 

per bound on the energy consumed by an homogeneous allocation. Then, ifV — I ) 

is used as the measure of dispersion of the Aj ( related to the a/2-th moment of their square values), then 



Dem \ / Dem 



i-FAiL-j^y vi-p^iL+v/i^' 




.2 s a/2 



Proof Let us first introduce V = - ■ Then V >V'. Indeed, V ~ V ^ (^^) 

' J that has the same sign as I ) — I ^ ' ) that is non-negative by application of Hoelder's 



inequality. 

Moreover, we have seen that a necessary condition (see Section 2.1 1 for allocation Aj to be valid is 



given by (1-Fail) Ejli A + y'^"' x ^ Dem, what induces (1-Fail) ( minenergy 

( MINENERGY _ p.Al/a > ^ ^ jj y> ^INENERGY _ ( DEM A" • 

V V m I — ^ m y (l-FAIL)m- V-ft^m / ^ 



lently m'^V < ^^^^S^ - ^ . □ 

3 Fractional Min-Energy 
3.1 Lower bound 

Let us now consider that the number of participating machines is to be determined. In this case, we need
to take explicitly the leakage term into account (that was considered as constant in the previous section since
the number of switched-on machines was fixed). In this case, given that $k \in \{1, \ldots, m\}$, the goal is to
minimize

$$E^{(low)}(k) = k \times E_{stat} + k \left(\frac{\mathrm{Dem}}{(1 - \mathrm{Fail})\, k + \sqrt{K' k}}\right)^{\alpha}.$$

Let $g$ be the function defined on $]0, +\infty[$ by $g(x) = g_t(x) / g_d(x)^{\alpha}$. Let us prove that if $g_d$ is
non-decreasing, concave, positive, and $g_t$ is non-increasing, convex and positive, then $g$ is convex. On the one
hand, if $g_d$ fulfills its constraints, then $g_d^{-\alpha}$ is non-increasing, convex and positive, and on the other hand,
the product of two non-increasing, convex and positive functions is a convex function.

Let us apply the above lemma with $g_t(x) = x / x^{\alpha/2}$ (which is convex since $\alpha > 2$) and
$g_d(x) = (1 - \mathrm{Fail})\sqrt{x} + \sqrt{K'}$, and deduce easily that $E^{(low)}$ is convex.

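Therefore, $E^{(low)}$ admits a unique minimum on $[1, m]$. Since $E^{(low)}(x) \to +\infty$ as $x \to 0$ and
$E^{(low)}(x) \to +\infty$ as $x \to +\infty$, $(E^{(low)})'$ is null at some point in $]0, +\infty[$, and let us define
$x^{(low)}_{min}$ such that $(E^{(low)})'(x^{(low)}_{min}) = 0$, i.e.

$$E_{stat} + \left(\frac{\mathrm{Dem}}{(1 - \mathrm{Fail})\, x^{(low)}_{min} + \sqrt{K' x^{(low)}_{min}}}\right)^{\alpha} \times \frac{-(\alpha - 1)(1 - \mathrm{Fail}) + \left(1 - \frac{\alpha}{2}\right) \sqrt{K' / x^{(low)}_{min}}}{(1 - \mathrm{Fail}) + \sqrt{K' / x^{(low)}_{min}}} = 0. \quad (1)$$

The minimum of function $E^{(low)}$ is reached on $[1, m]$ for $\min(\max(x^{(low)}_{min}, 1), m)$.

We can also obtain a lower bound on the energy consumption if we restrict the search to an integral
number of machines. Due to the convexity of $E^{(low)}$, the minimum is achieved either at
$\lceil \min(\max(x^{(low)}_{min}, 1), m) \rceil$ or $\lfloor \min(\max(x^{(low)}_{min}, 1), m) \rfloor$, so that

$$E \ge \min\left(E^{(low)}\left(\lceil \min(\max(x^{(low)}_{min}, 1), m) \rceil\right),\ E^{(low)}\left(\lfloor \min(\max(x^{(low)}_{min}, 1), m) \rfloor\right)\right).$$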
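
For small platforms, the integral lower bound can also be obtained without solving the stationarity condition, simply by evaluating $E^{(low)}(k)$ for every $k \in \{1, \ldots, m\}$. The sketch below takes this brute-force route, which is a substitute for the closed-form approach of the text, and assumes $E^{(low)}(k) = k E_{stat} + k \left(\mathrm{Dem} / ((1 - \mathrm{Fail}) k + \sqrt{K' k})\right)^{\alpha}$. The function names and numerical values are hypothetical.

```python
import math


def e_low(k, dem, fail, rel, alpha, e_stat):
    """Lower bound on the energy when exactly k machines are switched on."""
    k_prime = -math.log1p(-rel) / 2.0
    dyn = k * (dem / ((1.0 - fail) * k + math.sqrt(k_prime * k))) ** alpha
    return k * e_stat + dyn


def best_integer_k(m, dem, fail, rel, alpha, e_stat):
    """Number of machines in {1, ..., m} minimizing e_low (exhaustive scan)."""
    return min(range(1, m + 1), key=lambda k: e_low(k, dem, fail, rel, alpha, e_stat))


# Hypothetical instance: Dem = 20, m = 10, Fail = 0.1, Rel = 8e-6, alpha = 3.
for e_stat in (0.0, 100.0, 1e9):
    print(e_stat, best_integer_k(10, 20, 0.1, 8e-6, 3.0, e_stat))
```

With no leakage cost every machine is used, while a very large leakage cost pushes the minimum toward a single machine; intermediate values of `e_stat` give an interior optimum, as expected from convexity.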

3.2 Upper bound - Homogeneous 

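The energy consumption of a Homogeneous solution on $k$ machines is given by

$$E^{(up)}(k) = k \times E_{stat} + k \left(\frac{\mathrm{Dem}}{(1 - \mathrm{Fail})\, k - \sqrt{K k}}\right)^{\alpha}.$$

Let us apply again the above lemma with $g_t(x) = \mathrm{Dem}^{\alpha} / x^{\alpha - 1}$ and $g_d(x) = 1 - \mathrm{Fail} - \sqrt{K/x}$ to prove that
$E^{(up)}$ is convex and consequently admits a unique minimum on $[1, m]$. Moreover, $E^{(up)}(x) \to +\infty$
at both ends of the interval on which $g_d$ is positive, so that we can uniquely define $x^{(up)}_{min}$ by
$(E^{(up)})'(x^{(up)}_{min}) = 0$, i.e.

$$E_{stat} = \left(\frac{\mathrm{Dem}}{(1 - \mathrm{Fail})\, x^{(up)}_{min} - \sqrt{K x^{(up)}_{min}}}\right)^{\alpha} \times \frac{(\alpha - 1)(1 - \mathrm{Fail}) + \left(1 - \frac{\alpha}{2}\right) \sqrt{K / x^{(up)}_{min}}}{(1 - \mathrm{Fail}) - \sqrt{K / x^{(up)}_{min}}}. \quad (2)$$

Therefore, we end up with the following upper bound for the energy

$$E \le \min\left(E^{(up)}\left(\lceil \min(\max(x^{(up)}_{min}, 1), m) \rceil\right),\ E^{(up)}\left(\lfloor \min(\max(x^{(up)}_{min}, 1), m) \rfloor\right)\right).$$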

4 Algorithms for the Integral Case 

In the service allocation problem in Clouds, demands represent a number of virtual machines that need to
be allocated onto physical machines. Therefore, the number of instances allocated to each machine has to
be an integer, and we need to adapt the above results in order to obtain valid allocation schemes. Moreover,
the application of Chernoff bounds makes it possible to find valid solutions (satisfying the reliability
constraints) and to obtain theoretical upper bounds, but Chernoff bounds are in general too pessimistic,
especially in the case of a low number of machines. Hence, we derive in this section a few heuristics that
return lower energy than those obtained in Section 2.2.



4.1 Min-Energy-No-Shutdown 

4.1.1 Algorithms 

lower.bound In order to evaluate the performance of the heuristics, we rely on the lower bound proved
in Section 2.1. Since this lower bound is valid even among fractional solutions, it is a fortiori valid for
energy minimization for the integral problem.



theo.homo This algorithm builds a valid solution following the Homogeneous policy. We have exhibited
such a solution in Section 2.2 on the fractional problem. In order to enforce the reliability constraint
while turning this solution into an integral one, we have to round the number of instances assigned
to each machine to the next integer, i.e. $\lceil A \rceil$, leading to an energy consumption of

$$E_{dyn} = m \times \left\lceil \frac{\mathrm{Dem}}{m \left(1 - \mathrm{Fail} - \sqrt{K/m}\right)} \right\rceil^{\alpha}.$$
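A minimal Python sketch of the theo.homo computation follows. It assumes that the fractional capacity has
the Chernoff-based form Dem / (m (1 - Fail - sqrt(K/m))); the function name `theo_homo_energy` and the
parameter `k_const` (standing for the constant K of the Chernoff bound, taken here as an input rather than
derived) are hypothetical.

```python
import math


def theo_homo_energy(
    m: int, dem: int, fail: float, k_const: float, alpha: float
) -> float:
    """Dynamic energy of the rounded Homogeneous solution on m machines."""
    # Fraction of the machines that the bound allows us to count on.
    slack = 1.0 - fail - math.sqrt(k_const / m)
    if slack <= 0.0:
        raise ValueError("too few machines for this bound to give a capacity")
    # Round the per-machine number of instances up to the next integer.
    capacity = math.ceil(dem / (m * slack))
    # Every machine runs at the same capacity, hence m * capacity^alpha.
    return m * capacity ** alpha
```

For instance, `theo_homo_energy(100, 500, 0.1, 1.0, 2.0)` rounds 6.25 instances per machine up to 7 and
returns 100 * 7^2 = 4900.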



best.homo This heuristic finds the best solution (i.e. the one that minimizes the energy consumption)
following the Homogeneous policy. It can be decomposed into an off-line and an on-line phase; the former
is executed once and for all, while the latter is to be run for each reliability constraint.

In the off-line phase, we build a double-entry table, where a row is associated with a number of
machines m and a column corresponds to a reliability requirement Rel. The value of a cell indicates
the maximum number m' ≤ m such that the probability of having at least m' alive machines among the m
initial machines at the end of the day is not less than 1 - Rel. Those values can be obtained thanks to a
cumulative binomial distribution.

In the on-line phase, we perform a binary search on the machine capacity, so that we end up with a
valid solution minimizing the energy. Indeed, this solution is the one that minimizes the common
capacity of the machines, and if the reliability constraint is fulfilled for a given capacity, it is a fortiori
fulfilled for a higher capacity. At each step, for a given capacity, we just have to check, using the table,
whether the number of alive instances is large enough.
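The two phases can be sketched as follows in Python. This is a simplified illustration, not the authors'
implementation: the function names are hypothetical, a single table cell is computed on demand instead of
a full table, and machines are assumed to fail independently with probability Fail.

```python
from math import comb
from typing import Optional


def guaranteed_alive(m: int, fail: float, rel: float) -> int:
    """Largest m' such that P(at least m' of m machines alive) >= 1 - rel."""
    tail = 0.0  # running value of P(number of alive machines >= j)
    for j in range(m, -1, -1):
        tail += comb(m, j) * (1.0 - fail) ** j * fail ** (m - j)
        if tail >= 1.0 - rel:
            return j
    return 0  # only reached through floating-point rounding


def best_homo_capacity(m: int, dem: int, fail: float, rel: float) -> Optional[int]:
    """Smallest common capacity that keeps dem instances alive, or None."""
    alive = guaranteed_alive(m, fail, rel)
    if alive == 0:
        return None  # no machine can be relied upon at this reliability level
    lo, hi = 1, dem  # capacity dem is always valid when alive >= 1
    while lo < hi:
        mid = (lo + hi) // 2
        if mid * alive >= dem:  # validity is monotone in the capacity
            hi = mid
        else:
            lo = mid + 1
    return lo
```

With m = 10, Fail = 0.1 and Rel = 0.01, at least 6 machines are alive with probability about 0.998, whereas
at least 7 are alive with probability only about 0.987, so the cell value is 6 and the capacity returned for
Dem = 500 is 84.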



best.step This heuristic aims at relaxing the homogeneous constraint by finding the best solution of
the following form: there exists Capa such that the number of instances on each machine is either Capa
or Capa - 1. To achieve this goal, it first calls the previous best.homo heuristic, which returns Capa.
This ensures that an allocation of Capa instances per machine leads to a valid solution, whereas if we
allocate Capa - 1 instances to each machine, the reliability constraint is violated. Then, we perform
another binary search on the number of machines that will hold Capa - 1 instances instead of Capa. The
validity of a given allocation is checked thanks to the dynamic programming algorithm described in [13].
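The second binary search can be illustrated with the Python sketch below. Note that the validity check used
here is not the dynamic programming algorithm of [13]: since only two capacities occur, it is replaced by a
direct enumeration over two independent binomial distributions. All function names are hypothetical, and
`capa` is assumed to come from best.homo, so that no machine at Capa - 1 is a valid choice.

```python
from math import comb
from typing import List


def binomial_pmf(n: int, q: float) -> List[float]:
    """P(exactly j of n machines are alive), each alive with probability q."""
    return [comb(n, j) * q ** j * (1.0 - q) ** (n - j) for j in range(n + 1)]


def step_is_valid(
    m: int, n_low: int, capa: int, dem: int, fail: float, rel: float
) -> bool:
    """Check an allocation with n_low machines at capa - 1 and the rest at capa."""
    q = 1.0 - fail
    high = binomial_pmf(m - n_low, q)  # alive machines holding capa instances
    low = binomial_pmf(n_low, q)  # alive machines holding capa - 1 instances
    success = sum(
        p_a * p_b
        for a, p_a in enumerate(high)
        for b, p_b in enumerate(low)
        if a * capa + b * (capa - 1) >= dem
    )
    return success >= 1.0 - rel


def best_step(m: int, capa: int, dem: int, fail: float, rel: float) -> int:
    """Largest number of machines that can hold capa - 1 instances."""
    lo, hi = 0, m  # n_low = 0 is the (valid) best.homo allocation
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if step_is_valid(m, mid, capa, dem, fail, rel):
            lo = mid  # lowering more machines can only hurt, so search upward
        else:
            hi = mid - 1
    return lo
```

The binary search relies on the fact that moving one more machine from Capa to Capa - 1 can only decrease
the number of alive instances, so validity is monotone in `n_low`.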



4.1.2 Results 

In Figure 2, we compare the performance of all heuristics under the following settings: Fail = $10^{-?}$,
Dem = 500, Rel = $10^{-?}$, α = 2 and m varies between 1 and 600. lower.bound is depicted in red,
best.step in pink, best.homo in blue and theo.homo in green. We can clearly see the
aliasing issue of Homogeneous solutions on the integral problem: over some ranges of m, increasing the
number of loaded machines does not decrease the overall capacity.

Step solutions almost completely solve this issue and smooth the best.homo curve, while still staying
above lower.bound. The ratio between the energy dissipated by best.homo and lower.bound is under 1.5 as
soon as m > 25.




Figure 2: Simulation results for Fail = $10^{-?}$, Dem = 500, Rel = $10^{-?}$, α = 2, 1 ≤ m ≤ 600,
E_stat = 50 (curves: lower.bound, theo.homo, best.homo, best.step).



4.2 Min-Energy 

When adding a non-zero static energy, all heuristics and bounds are such that the overall dissipated energy
tends to +∞ if the number of machines becomes too small (because of the dynamic energy) or tends to +∞
(because of the static energy). It remains to find, for each of them, a number of machines that is close
to the optimal one.

We have proved the convexity of the energy function returned by lower.bound. Thus, solving Equation (1)
using binary search is enough to obtain the optimal m. When turning from fractional
Homogeneous solutions to integral ones, convexity is lost and there is no easy way to find the optimal m.
Therefore, we try all possible numbers of machines and keep the one that minimizes the consumed energy.
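The exhaustive search can be sketched as follows in Python. The sketch assumes that each used machine
contributes a static energy E_stat on top of the dynamic energy; the names `best_machine_count` and
`toy_dyn` are hypothetical, and `toy_dyn` is a small stand-in for an integral Homogeneous dynamic energy.

```python
import math
from typing import Callable, Optional, Tuple


def best_machine_count(
    dyn_energy: Callable[[int], float], e_stat: float, m_max: int
) -> Tuple[Optional[int], float]:
    """Try every machine count in [1, m_max] and keep the cheapest one."""
    best_m, best_e = None, math.inf
    for m in range(1, m_max + 1):
        try:
            total = dyn_energy(m) + m * e_stat
        except ValueError:
            continue  # this machine count admits no valid solution
        if total < best_e:
            best_m, best_e = m, total
    return best_m, best_e


def toy_dyn(m: int) -> float:
    """Hypothetical dynamic energy: 12 instances spread evenly, alpha = 2."""
    return m * math.ceil(12 / m) ** 2


m_opt, e_opt = best_machine_count(toy_dyn, 10.0, 12)
```

On this toy instance the total energy is 76 for m = 4, 95 for m = 5 and 84 for m = 6, which shows how
rounding breaks convexity and why a search restricted to the neighbourhood of a real minimizer is not
sufficient.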

Concerning best.homo and best.step, trying all possible numbers of machines would be too expensive,
since computing the consumed energy for a given m is in general #P-complete. As the dynamic energy
returned by best.homo or best.step lies between the dynamic energy given by the lower and upper bounds
of the fractional problem, the number of machines for best.homo and best.step lies between the solutions
of Equation (1) and Equation (2). Thus, for best.homo and best.step, we choose m as the mean of these two
solutions. The results for E_stat = 50 are depicted in Figure 2.
5 Conclusion and Open Problems

In this paper, we have considered approximation algorithms for minimizing both the number of used
resources and the dissipated energy in the context of service allocation under reliability constraints on
Clouds. For both optimization problems, we have given lower bounds and have exhibited algorithms that
achieve the claimed reliability. In the case of energy minimization, we have even been able to prove that
the proposed algorithm is asymptotically optimal when the number of machines becomes arbitrarily large.
Such a result is important since it makes it possible, from the Cloud provider's point of view, to associate
a price to reliability (or to fix penalties in case of SLA violation). This work opens many perspectives.
First, it seems possible to obtain, relying on different techniques, better approximation ratios in the case
of a low number of resources. Then, the extension to several services is easy: all results can be generalized
except the lower bound on the energy consumption. Still, we can use the lower bound obtained for resource
minimization and extend it to energy minimization. Finally, it would be interesting to take explicitly
into account the memory footprint of the services, so as to limit the number of different services that a
machine can handle. This would lead to different solution shapes, by limiting the number of participating
physical machines in the deployment of each individual service.